NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Conformal Prediction: A Data Perspective

https://doi.org/10.1145/3736575

Zhou, Xiaofan; Chen, Baiting; Gui, Yu; Cheng, Lu (January 2026, ACM Computing Surveys)

Conformal prediction (CP), a distribution-free uncertainty quantification (UQ) framework, reliably provides valid predictive inference for black-box models. CP constructs prediction sets or intervals that contain the true output with a specified probability. However, modern data science’s diverse modalities, along with increasing data and model complexity, challenge traditional CP methods. These developments have spurred novel approaches to address evolving scenarios. This survey reviews the foundational concepts of CP and recent advancements from a data-centric perspective, including applications to structured, unstructured, and dynamic data. We also discuss the challenges and opportunities CP faces in large-scale data and models.
more » « less
Full Text Available
Distributionally robust risk evaluation with an isotonic constraint

Gui, Yu; Barber, Rina; Ma, Cong (December 2024, Arxiv)

Full Text Available
Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees

Gui, Yu; Jin, Ying; Ren, Zhimei (September 2024, Curran Associates)

Before deploying outputs from foundation models in high-stakes tasks, it is imperative to ensure that they align with human values. For instance, in radiology report generation, reports generated by a vision-language model must align with human evaluations before their use in medical decision-making. This paper presents Conformal Alignment, a general framework for identifying units whose outputs meet a user-specified alignment criterion. It is guaranteed that on average, a prescribed fraction of selected units indeed meet the alignment criterion, regardless of the foundation model or the data distribution. Given any pre-trained model and new units with model-generated outputs, Conformal Alignment leverages a set of reference data with ground-truth alignment status to train an alignment predictor. It then selects new units whose predicted alignment scores surpass a data-dependent threshold, certifying their corresponding outputs as trustworthy. Through applications to question answering and radiology report generation, we demonstrate that our method is able to accurately identify units with trustworthy outputs via lightweight training over a moderate amount of reference data. En route, we investigate the informativeness of various features in alignment prediction and combine them with standard models to construct the alignment predictor.
more » « less
Full Text Available
Conformalized Matrix Completion

Gui, Yu; Barber, Rina; Ma, Cong (December 2023, Advances in Neural Information Processing Systems)
Conformalized matrix completion

Gui, Yu; Barber, Rina Foygel; Ma, Cong (December 2023, NeurIPS)

Full Text Available
Conformalized survival analysis with adaptive cut-offs

https://doi.org/10.1093/biomet/asad076

Gui, Yu; Hore, Rohan; Ren, Zhimei; Barber, Rina Foygel (December 2023, Biometrika)

Summary This paper introduces an assumption-lean method that constructs valid and efficient lower predictive bounds for survival times with censored data. We build on recent work by Candès et al. (2023), whose approach first subsets the data to discard any data points with early censoring times and then uses a reweighting technique, namely, weighted conformal inference (Tibshirani et al., 2019), to correct for the distribution shift introduced by this subsetting procedure. For our new method, instead of constraining to a fixed threshold for the censoring time when subsetting the data, we allow for a covariate-dependent and data-adaptive subsetting step, which is better able to capture the heterogeneity of the censoring mechanism. As a result, our method can lead to lower predictive bounds that are less conservative and give more accurate information. We show that in the Type-I right-censoring setting, if either the censoring mechanism or the conditional quantile of the survival time is well estimated, our proposed procedure achieves nearly exact marginal coverage, where in the latter case we additionally have approximate conditional coverage. We evaluate the validity and efficiency of our proposed algorithm in numerical experiments, illustrating its advantage when compared with other competing methods. Finally, our method is applied to a real dataset to generate lower predictive bounds for users’ active times on a mobile app.
more » « less
Full Text Available

Search for: All records